Bark provides a variety of predefined speaker prompts, selected via the history_prompt argument (see the sketch after this list):
- v2/en_speaker_0 through v2/en_speaker_9: English speakers
- v2/de_speaker_0 through v2/de_speaker_9: German speakers
- v2/es_speaker_0 through v2/es_speaker_9: Spanish speakers
- v2/fr_speaker_0 through v2/fr_speaker_9: French speakers
- v2/hi_speaker_0 through v2/hi_speaker_9: Hindi speakers
- v2/it_speaker_0 through v2/it_speaker_9: Italian speakers
- v2/ja_speaker_0 through v2/ja_speaker_9: Japanese speakers
- v2/ko_speaker_0 through v2/ko_speaker_9: Korean speakers
- v2/pl_speaker_0 through v2/pl_speaker_9: Polish speakers
- v2/pt_speaker_0 through v2/pt_speaker_9: Portuguese speakers
- v2/ru_speaker_0 through v2/ru_speaker_9: Russian speakers
- v2/tr_speaker_0 through v2/tr_speaker_9: Turkish speakers
- v2/zh_speaker_0 through v2/zh_speaker_9: Chinese speakers
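As a quick way to compare presets, a minimal sketch like the following (the sample text and output filenames are illustrative) renders the same line with a few English speakers and writes each result to disk with scipy:
from bark import generate_audio, preload_models, SAMPLE_RATE
from scipy.io.wavfile import write as write_wav

preload_models()

text = "Hello, this is a voice preset comparison."
for i in range(3):
    preset = f"v2/en_speaker_{i}"
    # history_prompt selects the speaker preset
    audio_array = generate_audio(text, history_prompt=preset)
    write_wav(f"preset_en_{i}.wav", SAMPLE_RATE, audio_array)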
Temperature Controls
# Sampling temperatures available across the generation pipeline
{
    "text_temp": 0.7,
    "waveform_temp": 0.7,
    "semantic_temp": 0.7,
    "coarse_temp": 0.7,
    "fine_temp": 0.7
}
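Of these, text_temp and waveform_temp are exposed directly by the high-level generate_audio call; the per-stage semantic, coarse, and fine temperatures correspond to Bark's lower-level generation functions. A minimal sketch with illustrative values:
from bark import generate_audio, preload_models

preload_models()
# Lower temperatures give more deterministic output; higher, more varied
audio_array = generate_audio(
    "Temperature shapes how varied the delivery sounds.",
    text_temp=0.6,
    waveform_temp=0.8,
)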
Voice Cloning
# Generate with a predefined speaker preset passed as history_prompt
voice_preset = "v2/en_speaker_1"
audio_array = generate_audio(text, history_prompt=voice_preset)
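The returned array is a raw waveform at Bark's native sample rate; a short sketch for saving it to disk (the filename is illustrative), assuming scipy is installed:
from scipy.io.wavfile import write as write_wav
from bark import SAMPLE_RATE

# Persist the generated waveform as a WAV at Bark's 24 kHz sample rate
write_wav("speaker_1_output.wav", SAMPLE_RATE, audio_array)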
Small Model CPU Performance
Setting the SUNO_USE_SMALL_MODELS=1 environment variable loads smaller model checkpoints, which makes CPU inference practical:
- Model Load Time: ~4.3s on CPU
- Generation Speed: ~62s to generate 6s of audio
- Example Text: "In the light of the moon, a little egg lay on a leaf"
# Enable small models for CPU.
# Set the environment variables before importing bark so they take effect.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = ""   # hide GPUs to force CPU inference
os.environ["SUNO_USE_SMALL_MODELS"] = "1" # use the smaller checkpoints
from bark import generate_audio, preload_models, SAMPLE_RATE

# Load the small models (~4.3s on the reference system)
preload_models()

# Generate audio (~62s for 6s of output on the reference system)
text = "In the light of the moon, a little egg lay on a leaf"
audio_array = generate_audio(text)
Note: Performance measured on a system with 10 CPU cores. Your results may vary based on hardware.
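To reproduce these measurements on your own hardware, a rough timing sketch (the printed numbers will differ from the figures above):
import os
import time

os.environ["CUDA_VISIBLE_DEVICES"] = ""
os.environ["SUNO_USE_SMALL_MODELS"] = "1"
from bark import generate_audio, preload_models, SAMPLE_RATE

start = time.time()
preload_models()
print(f"model load: {time.time() - start:.1f}s")

start = time.time()
audio_array = generate_audio("In the light of the moon, a little egg lay on a leaf")
elapsed = time.time() - start
duration = len(audio_array) / SAMPLE_RATE
print(f"generated {duration:.1f}s of audio in {elapsed:.1f}s")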